Welcome to the third practical on text mining! The aim of this practical is to enhance your understanding of sentiment analysis and to learn three different ways of performing it. Note that we evaluate the unsupervised models on all of the data. For the supervised models, we train on 80% of the data and evaluate performance on the remaining 20%.
In this practical, we will focus on the following methods:
In this practical, we make use of the following packages:
library(tm)
library(text2vec)
library(tidyverse)
library(tidytext)
library(ggplot2)
library(caret)
library(rpart)
library(rpart.plot)

We are going to use one data set in this practical: movie_review from the
text2vec package. This data set consists of 5000 IMDB movie
reviews, specially selected for sentiment analysis. The sentiment of the
reviews is binary: an IMDB rating < 5 results in a sentiment
score of 0, and a rating >= 7 results in a sentiment score of 1. No
individual movie has more than 30 reviews. Load this data set and
convert it to a dataframe.

# load an example dataset from text2vec
data("movie_review")
# convert to a tibble (a tidyverse data frame) and print it
as_tibble(movie_review)
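
As a quick check of the description above, and to make the 80/20 split for the supervised models concrete, here is a minimal sketch. It assumes a stratified split on the sentiment column using createDataPartition() from caret. The object names train_idx, movie_train and movie_test and the seed value are illustrative choices, not part of the practical itself.

```r
library(text2vec)  # provides the movie_review data set
library(caret)     # provides createDataPartition()

data("movie_review")

# Basic checks on the data described above
dim(movie_review)              # number of reviews and number of columns
table(movie_review$sentiment)  # counts of negative (0) and positive (1) reviews

# Stratified 80/20 split on the sentiment label.
# The seed is an arbitrary value (assumption), used only for reproducibility.
set.seed(123)
train_idx <- createDataPartition(factor(movie_review$sentiment),
                                 p = 0.8, list = FALSE)[, 1]

movie_train <- movie_review[train_idx, ]   # roughly 80% of the reviews
movie_test  <- movie_review[-train_idx, ]  # the remaining reviews

nrow(movie_train)
nrow(movie_test)
```

Stratifying on the label keeps the proportion of positive and negative reviews about the same in both parts. A plain random split with sample() would also be a reasonable choice here.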